In my work I support a couple of companies that are engaged in distribution of one kind or another. The distribution business is generally about moving boxes from point A to point B. A group of boxes going from one shipper location to a final consignee is called a shipment or a consignment. It is important to ensure that every box within the consignment is delivered, and to spot cases where boxes (components of a shipment) haven’t yet been passed through a particular checkpoint. During the lifetime of a consignment within the distribution network, each consignment goes through a sequence of statuses. For example, there’s one called “MDE” which is the time at which information from the consignment note (customer, addresses, etc.) is encoded into the application. Many of the consignment status codes are recorded by scanning a bar code, usually on a package, so there are two types of status: package and consignment level. Included in the package’s bar code is a unique package number within the consignment, e.g., 4 of 9.
When a consignment enters or leaves a distribution center, it is a general practice to scan each package to a different status code. The application contains various forms for tracking the status of a shipment. When the form displays a consignment with many packages, this can result in a huge number of scans (rows) in the history, so in practice what is done is to simply construct a delimited list of package numbers that were each scanned to a status.
When you’ve got 999 packages in a consignment, this will still result in a pretty long list of package numbers.
Both in commerce and science, we already have a way of representing ordered lists of ‘broken sequences.’ This allows us to describe islands of unbroken sequences economically as, for example, ‘2-8, 10-16, 18-24, 26-32, 34-40, 42-48. Even better, we can intermix these with the usual delimited list. ‘1, 2, 5-9’. The unbroken sequence from 1 to 100 would be very economically rendered as ‘1-100’, and it is easy to understand.
Sometimes it makes sense to use this representation of a sequence within an application. How can we handle this in SQL, turning it into a form that is easily manipulated by that relational system, and then back again into a text-based, delimited list? I’ll be showing you how in this article. I’ll first suggest how to condense a delimited list of integer package numbers into a list of unbroken ‘islands,’ show how to determine when these islands are missing packages and then go in the other direction by ‘exploding’ a condensed list.
If you’re a SQL enthusiast, you may want to look ahead to “The Final Word” section to see what algorithms will be combined in this article’s examples.
Data Setup
In order to explain and demonstrate the concept of condensing a delimited list, we’ll need to start by creating some sample data. For this, we’ll create two tables:
- The first will contain a row set where the primary key the consignment number (a “class” of items) and also the package (or item) number within the class.
- The second table will contain the same consignment number key but in its second column will be a delimited list of the packages that appeared as separate rows in our first table.
Let’s create and populate the first table (#ConsignmentHistory). For the purposes of our examples, we’re ignoring the status code that we mentioned above, or really what we’re doing is assuming that all of these examples are scans to the same status code.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 |
-- Create our Consignment History table CREATE TABLE #ConsignmentHistory ( ConsignmentNo VARCHAR(20), PkgNo INT, PRIMARY KEY (ConsignmentNo, PkgNo) ); -- Insert some small sequences of packages INSERT INTO #ConsignmentHistory SELECT '2322',1 UNION ALL SELECT '2322',3 UNION ALL SELECT '2322',4 UNION ALL SELECT '2343',1 UNION ALL SELECT '2343',2 UNION ALL SELECT '2343',5 UNION ALL SELECT '2343',6 UNION ALL SELECT '7611',1 UNION ALL SELECT '7611',2 UNION ALL SELECT '7611',4 UNION ALL SELECT '7611',6 UNION ALL SELECT '5195',7 UNION ALL SELECT '5195',1 UNION ALL SELECT '5195',2 UNION ALL SELECT '5195',5; -- Insert a longer sequence of packages with some gaps WITH Tally (n) AS ( SELECT number FROM [master].dbo.spt_values WHERE [Type] = 'P' AND Number BETWEEN 1 AND 50) INSERT INTO #ConsignmentHistory SELECT '4385', n FROM Tally WHERE 1 <> n%8; -- Insert the longest sequence of packages with no gaps WITH Tally (n) AS ( SELECT number FROM [master].dbo.spt_values WHERE [Type] = 'P' AND Number BETWEEN 1 AND 100) INSERT INTO #ConsignmentHistory SELECT '6377', n FROM Tally; SELECT * FROM #ConsignmentHistory ORDER BY ConsignmentNo, PkgNo; |
Note that for each of the consignment numbers except the last, there exist some gaps in the package numbers. We can use the familiar FOR XML PATH method within a correlated sub-query to populate our second table, where the second column contains an ordered delimited list of the PkgNos.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 |
-- A second table to store our de-normalized views of packages CREATE TABLE #ConsHistoryDelimited ( ConsignmentNo VARCHAR(20) PRIMARY KEY, PkgNo VARCHAR(8000), StartPkgNo INT, EndPkgNo INT, CondensedPkgNos VARCHAR(8000), MissingPkgNos VARCHAR(8000) ); -- Construct the delimited list using the standard FOR XML PATH method INSERT INTO #ConsHistoryDelimited (ConsignmentNo, PkgNo, StartPkgNo, EndPkgNo) SELECT ConsignmentNo, STUFF(( SELECT ', ' + CAST(PkgNo AS VARCHAR(5)) FROM #ConsignmentHistory b WHERE a.ConsignmentNo = b.ConsignmentNo ORDER BY b.ConsignmentNo FOR XML PATH('')), 1, 2, '') -- Ignore these for now (we'll find a use for them later) ,MIN(PkgNo), MAX(PkgNo) FROM #ConsignmentHistory a GROUP BY ConsignmentNo; SELECT ConsignmentNo, PkgNo FROM #ConsHistoryDelimited WHERE ConsignmentNo IN ('2322','2343','7611','5195'); |
For the time being, we will ignore the last 4 columns in the #ConsHistoryDelimited table, but we will be using them later. We can see that for the SELECTed results, our delimited lists are quite short:
1 2 3 4 5 |
ConsignmentNo PkgNo 2322 1, 3, 4 2343 1, 2, 5, 6 5195 1, 2, 5, 7 7611 1, 2, 4, 6 |
However if we were to examine the results for the last 2 consignments, you’d see that those lists are quite long.
Creating a Condensed List of Delimited Integer Items
One objective of this article is to take a delimited list of integers such as the ones we’ve stored in #ConsHistoryDelimited and condense them, in order to reduce the length of the string. Why would we want to do such a thing you might ask? A fair enough question, but first let’s start with our desired results set.
1 2 3 4 5 6 7 |
ConsignmentNo PkgNo 2322 1, 3-4 2343 1-2, 5-6 5195 1-2, 4, 6-7 7611 1-2, 5 4385 2-8, 10-16, 18-24, 26-32, 34-40, 42-48, 50 6377 1-100 |
Because our first few lists are quite short to begin with, we haven’t achieved much of a reduction in length. We do see a difference for consignment number 7611 and a significant difference for consignment numbers 4385 and especially 6377. Such a shortening of the list of items could be quite useful if you need to display that list in your application’s front end. The condensed version may actually be easier for a viewer to grasp, especially if they want to know whether the list is complete or not. We have specifically included a space after the commas, so that when you display the delimited list in a form, the long list of items can be wrapped to multiple lines easily.
In order to condense our list, we must draw upon the concept of Gaps and Islands, but more specifically we need to group ranges (islands) of contiguous package numbers, similar to the way SQL MVPJeff Moden demonstrated you can Group Islands of Contiguous Dates. This technique was also demonstrated (possibly pre-dating Jeff’s article) by SQL MVP Itzik Ben-Gan in Chapter 5 of the book SQL Server MVP Deep Dives. Let’s return to our first table and calculate the islands utilizing this approach, which I’ve also heard referred to as the “staggered rows” approach to calculating islands:
1 2 3 4 5 6 7 8 9 10 11 12 |
-- Display the "islands" (ranges) of contiguous packages for each consignment WITH Islands AS ( SELECT ConsignmentNo, StartPkgNo=MIN(PkgNo), EndPkgNo=MAX(PkgNo) FROM ( SELECT ConsignmentNo, PkgNo -- This rn represents the "staggered rows" ,rn=PkgNo-ROW_NUMBER() OVER (PARTITION BY ConsignmentNo ORDER BY PkgNo) FROM #ConsignmentHistory) a GROUP BY ConsignmentNo, rn) SELECT ConsignmentNo, StartPkgNo, EndPkgNo FROM Islands ORDER BY ConsignmentNo, StartPkgNo; |
We chose this approach because it is pretty efficient and after a bit of study, reasonably easy to grasp conceptually. Here are the results:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
ConsignmentNo StartPkgNo EndPkgNo 2322 1 1 2322 3 4 2343 1 2 2343 5 6 4385 2 8 4385 10 16 4385 18 24 4385 26 32 4385 34 40 4385 42 48 4385 50 50 5195 1 2 5195 5 5 5195 7 7 6377 1 100 7611 1 2 7611 4 4 7611 6 6 |
To construct our condensed list of package numbers, we need to combine the above Islands with the FOR XML PATH technique to generate the delimited list that is our desired results set. The only thing different than when we used FOR XML PATH when setting up the #ConsHistoryDelimited, is to CASE the islands that consist of a single PkgNo as a single integer rather than ending up with something like 3-3, as follows:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
-- Using the islands, put the delimited list with hyphenated package number -- ranges into our table WITH Islands AS ( SELECT ConsignmentNo, StartPkgNo=MIN(PkgNo), EndPkgNo=MAX(PkgNo) FROM ( SELECT ConsignmentNo, PkgNo -- This rn represents the "staggered rows" ,rn=PkgNo-ROW_NUMBER() OVER (PARTITION BY ConsignmentNo ORDER BY PkgNo) FROM #ConsignmentHistory) a GROUP BY ConsignmentNo, rn) UPDATE a SET CondensedPkgNos=STUFF(( SELECT ', ' + CASE -- Include either a single Item or the range (hyphenated) WHEN StartPkgNo = EndPkgNo THEN CAST(StartPkgNo AS VARCHAR(5)) ELSE CAST(StartPkgNo AS VARCHAR(5)) + '-' + CAST(EndPkgNo AS VARCHAR(5)) END FROM Islands b WHERE a.ConsignmentNo = b.ConsignmentNo ORDER BY StartPkgNo FOR XML PATH('')), 1, 2, '') FROM #ConsHistoryDelimited a; |
We have chosen to UPDATE the corresponding column in our #ConsHistoryDelimited table with the condensed list of package numbers. Examination of the SELECTed results shows they are identical to the correct results we wanted to see!
Condensing a List of Delimited Items
Perhaps we already have the delimited list of items to start with and we simply want to condense it. Now that we have an inkling of how to do it, we can easily use Jeff Moden’s “community,” delimited-string splitter, the well-known and very popular, best-of-breed DelimitedSplit8K FUNCTION. In order to run the next example, you’ll need to download and install it from the linked article.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
-- Starting with the Delimited list, construct the condensed delimited list WITH Islands AS ( SELECT ConsignmentNo, StartPkgNo=MIN(PkgNo), EndPkgNo=MAX(PkgNo) FROM ( SELECT ConsignmentNo, c.PkgNo ,rn=c.PkgNo-ROW_NUMBER() OVER (PARTITION BY ConsignmentNo ORDER BY c.PkgNo) -- Changes to this CTE start here FROM #ConsHistoryDelimited a CROSS APPLY dbo.DelimitedSplit8K(a.PkgNo, ',') b CROSS APPLY (SELECT CAST(Item AS INT)) c(PkgNo) -- And end here ) a GROUP BY ConsignmentNo, rn) SELECT ConsignmentNo, a.PkgNo, CondensedPkgNos=STUFF(( SELECT ', ' + CASE -- Include either a single Item or the range (hyphenated) WHEN StartPkgNo = EndPkgNo THEN CAST(StartPkgNo AS VARCHAR(5)) ELSE CAST(StartPkgNo AS VARCHAR(5)) + '-' + CAST(EndPkgNo AS VARCHAR(5)) END FROM Islands b WHERE a.ConsignmentNo = b.ConsignmentNo ORDER BY StartPkgNo FOR XML PATH('')), 1, 2, '') FROM #ConsHistoryDelimited a; |
Note that only a very minor modification (look for the comment “Changes to this CTE start here”) was required to the Islands Common Table Expression (CTE) to use the delimited list stored in #ConsHistoryDelimited and split that string using DelimitedSplit8K. The Cascaded CROSS APPLY is only there to avoid a few extra CASTs of Item (column output by DelimitedSplit8K) to INT.
Another Case to Consider
So what’s missing? More specifically, perhaps we’d like to see a condensed list of those package numbers that are missing for each consignment. That is the other half of the Gaps and Islands class of problems. We’d like to know what the gaps are. Any of the many methods for calculating gaps will do, but we’ll choose a somewhat obscure method of calculating gaps from islands using CROSS APPLY VALUES. Using the Islands CTE that we already have, we can construct a query that will deliver a condensed delimited list of both Gaps and Islands:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 |
-- Our Islands CTE creates the ranges of package numbers WITH Islands AS ( SELECT ConsignmentNo, StartPkgNo=MIN(PkgNo), EndPkgNo=MAX(PkgNo) FROM ( SELECT ConsignmentNo, PkgNo -- This rn represents the "staggered rows" ,rn=PkgNo-ROW_NUMBER() OVER (PARTITION BY ConsignmentNo ORDER BY PkgNo) FROM #ConsignmentHistory) a GROUP BY ConsignmentNo, rn) -- Convert the Islands to Gaps (missing) packages using CROSS APPLY VALUES ,Islands2Gaps AS ( SELECT ConsignmentNo, GapStartPkgNo=MIN(PkgNo), GapEndPkgNo=MAX(PkgNo) FROM ( SELECT ConsignmentNo, PkgNo ,rn=ROW_NUMBER() OVER (PARTITION BY ConsignmentNo ORDER BY (SELECT NULL))/2 FROM Islands a CROSS APPLY (VALUES (StartPkgNo-1),(EndPkgNo+1)) b(PkgNo)) a GROUP BY ConsignmentNo, rn HAVING COUNT(PkgNo) = 2) -- We use correlated subqueries with FOR XML PATH to show how Islands and Gaps -- can be created at the same time. SELECT ConsignmentNo ,CondensedItemNos=STUFF(( SELECT ', ' + CASE -- Include either a single package or the range (hyphenated) WHEN StartPkgNo = EndPkgNo THEN CAST(StartPkgNo AS VARCHAR(5)) ELSE CAST(StartPkgNo AS VARCHAR(5)) + '-' + CAST(EndPkgNo AS VARCHAR(5)) END FROM Islands b WHERE a.ConsignmentNo = b.ConsignmentNo ORDER BY StartPkgNo FOR XML PATH('')), 1, 2, '') ,MissingPkgNos=STUFF(( SELECT ', ' + CASE -- Include either a single package or the range (hyphenated) WHEN GapStartPkgNo = GapEndPkgNo THEN CAST(GapStartPkgNo AS VARCHAR(5)) ELSE CAST(GapStartPkgNo AS VARCHAR(5)) + '-' + CAST(GapEndPkgNo AS VARCHAR(5)) END FROM Islands2Gaps b WHERE a.ConsignmentNo = b.ConsignmentNo ORDER BY GapStartPkgNo FOR XML PATH('')), 1, 2, '') FROM #ConsHistoryDelimited a; |
Note that we have reverted to the Islands CTE that draws its data directly from the #ConsignmentHistory table. The results output from this query are:
1 2 3 4 5 6 7 |
ConsignmentNo CondensedItemNos MissingPkgNos 2322 1, 3-4 2 2343 1-2, 5-6 3-4 4385 2-8, 10-16, 18-24, 26-32, 34-40, 42-48, 50 9, 17, 25, 33, 41, 49 5195 1-2, 5, 7 3-4, 6 6377 1-100 NULL 7611 1-2, 4, 6 3, 5 |
If we prefer, we can UPDATE the missing packages into our table.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 |
-- Our Islands CTE creates the ranges of package numbers WITH Islands AS ( SELECT ConsignmentNo, StartPkgNo=MIN(PkgNo), EndPkgNo=MAX(PkgNo) FROM ( SELECT ConsignmentNo, PkgNo -- This rn represents the "staggered rows" ,rn=PkgNo-ROW_NUMBER() OVER (PARTITION BY ConsignmentNo ORDER BY PkgNo) FROM #ConsignmentHistory) a GROUP BY ConsignmentNo, rn) -- Convert the Islands to Gaps (missing) packages using CROSS APPLY VALUES ,Islands2Gaps AS ( SELECT ConsignmentNo, GapStartPkgNo=MIN(PkgNo), GapEndPkgNo=MAX(PkgNo) FROM ( SELECT ConsignmentNo, PkgNo ,rn=ROW_NUMBER() OVER (PARTITION BY ConsignmentNo ORDER BY (SELECT NULL))/2 FROM Islands a CROSS APPLY (VALUES (StartPkgNo-1),(EndPkgNo+1)) b(PkgNo)) a GROUP BY ConsignmentNo, rn HAVING COUNT(PkgNo) = 2) -- Update the missing package numbers column of our #ConsHistoryDelimited table UPDATE a SET MissingPkgNos=STUFF(( SELECT ', ' + CASE -- Include either a single package or the range (hyphenated) WHEN GapStartPkgNo = GapEndPkgNo THEN CAST(GapStartPkgNo AS VARCHAR(5)) ELSE CAST(GapStartPkgNo AS VARCHAR(5)) + '-' + CAST(GapEndPkgNo AS VARCHAR(5)) END FROM Islands2Gaps b WHERE a.ConsignmentNo = b.ConsignmentNo ORDER BY GapStartPkgNo FOR XML PATH('')), 1, 2, '') FROM #ConsHistoryDelimited a; SELECT ConsignmentNo, MissingPkgNos FROM #ConsHistoryDelimited; |
Exploding the Condensed Delimited List
To explode a condensed, delimited list is to simply expand out any of the entries that include a hyphen. To do this, we’ll once again use DelimitedSplit8K (to split on comma separators) and the familiar concept of a Tally table. Pretty much any Tally table will do, whether it is a true table in your database or an in-line Tally table like I’ve done. To keep performance under control when using an in-line Tally table, make sure you don’t generate any more tally rows than you need, so check what I’ve done with TOP.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 |
-- Exploding a condensed, delimited list WITH Tally (n) AS ( -- Restrict the rows in our 0-based Tally table to only what we need using TOP SELECT 0 UNION ALL SELECT TOP (( SELECT MAX(EndPkgNo)-MIN(StartPkgNo) FROM #ConsHistoryDelimited) ) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM sys.all_columns a CROSS JOIN sys.all_columns b) SELECT ConsignmentNo, PkgNo=n FROM #ConsHistoryDelimited a -- First CA to split the condensed list on the comma delimiter CROSS APPLY dbo.DelimitedSplit8K(CondensedPkgNos, ',') b -- Now retrieve the range or make the single entry into a range CROSS APPLY ( SELECT StartPkgNo=LEFT(Item, CHARINDEX('-', Item + '-') - 1) ,EndPkgNo=CASE CHARINDEX('-', Item) WHEN 0 THEN Item ELSE RIGHT(Item, LEN(Item) - CHARINDEX('-', Item)) END ) c -- Apply our Tally table to the range (the resulting n is our PkgNo) CROSS APPLY ( SELECT n=n+c.StartPkgNo FROM Tally WHERE n+c.StartPkgNo BETWEEN c.StartPkgNo AND c.EndPkgNo) d ORDER BY ConsignmentNo, n; |
You didn’t really think I was going to try something fancy and inefficient there, now did you? This works exactly the same whether your list is the islands or the gaps (change CondensedPkgNos to MissingPkgNos as the argument to DelimitedSplit8K if you don’t believe me).
The Final Word
Displaying a condensed delimited list can be most useful in the presentation layer of an application, particularly in cases where you may have long integer lists that you need to fit into just a small bit of screen real estate. While we don’t necessarily recommend storing them, there are ways that they can be handled and their manipulation is relatively simple.
In this article, we’ve combined the following concepts in various ways throughout these examples:
- Creating a delimited list by using FOR XML PATH in a correlated subquery
- Splitting a delimited list by using DelimitedSplit8K
- Cascading CROSS APPLYs to create intermediate calculated values
- Calculating islands (ranges) of contiguous sequence numbers
- Using the islands to calculate gaps in (or missing) sequence numbers
- Using an in-line Tally table with rows limited to only what is needed
While I am no teacher, I do know that the best way you, my valued readers, can learn is by throwing out a challenge problem to which you can apply the skills you learned in this article. So here it is:
Convert the condensed, delimited list of missing (gaps between) package numbers to the included (islands) of package numbers. The challenge is to use the CROSS APPLY VALUES method for converting Gaps to Islands that is described here to make it happen. In that method, you must know StartPkgNo and EndPkgNo, and that is why they’ve been provided in the table for you.
Post your solution (or any variant on the challenge that suits you) to the discussion thread to show you’ve done your homework. If I don’t hear back from anybody in a reasonable period of time, I’ll post my solution to the challenge.
As always, thanks for your attention and I hope you found this article to be informative.
Load comments